Mapping the Diffusion of Information Among Major U.S. Research Institutions
Abstract
This paper reports the results of a large-scale data analysis that aims to identify the information production and consumption among top research institutions in the United States. A 20-year publication data set was analyzed to identify the 500 most cited research institutions and spatiotemporal changes in their inter-citation patterns. A novel approach to analyzing the dual role of institutions as information producers and consumers, and to studying the diffusion of information among them, is introduced. A geographic visualization metaphor is used to depict the production and consumption of knowledge. The highest producers and their consumers, as well as the highest consumers and their producers, are identified and mapped. Surprisingly, the introduction of the Internet does not seem to affect the distance over which information diffuses as manifested by citation links. The citation linkages between institutions fall off with the distance between them, and there is a strong linear relationship between the log of the citation counts and the log of the distance. The paper concludes with a discussion of these results and an outlook for future work. This is a revised and extended version of a paper that was originally presented at the 10th International Conference of the International Society for Scientometrics and Informetrics in Stockholm, July 2005.

Introduction

Does space still matter in the Internet age? Does one still have to study and work at a major research institution in order to have access to high-quality data and expertise, to produce high-quality research, and to diffuse results effectively? To answer these questions, an interdisciplinary publication data set covering the years 1982-2001 was analyzed to identify the 500 most cited research institutions in the United States and spatial changes in their inter-citation patterns.
Advanced data analysis and visualization techniques were applied to determine information sources and sinks and the diffusion patterns among them. The results of our analysis are surprising in that the increasing usage of the Internet does not seem to lead to more global citation patterns. In particular, the distance over which information diffuses as manifested by citation links does not increase over time. The remainder of the paper is organized as follows: Section 2 reviews related work and contrasts it with our approach; Section 3 describes the data set used in this analysis and how it was processed; Section 4 presents visualizations of the data analysis results; Section 5 concludes the paper with a discussion of results and future work.

Börner, Katy, Penumarthy, Shashikant, Meiss, Mark and Ke, Weimao. Mapping the Diffusion of Scholarly Knowledge Among Major U.S. Research Institutions. Accepted for Scientometrics. Dedicated issue on the 10th International Conference of the International Society for Scientometrics and Informetrics held in Stockholm.

Related Work and Our Approach

The diffusion of tangible objects (people, goods, etc.) as well as intangible objects (ideas, activity levels, etc.) has been studied in diverse fields of science, including physics, e.g., heat diffusion; robotics, e.g., communication among mobile robots; social network analysis [3]; bibliometrics/scientometrics/webometrics [5]; geography, e.g., migration studies; and biology, e.g., neuronal migration in the nervous system. Other studies have attempted to judge the research vitality or quality of research conducted at specific research institutions. Diverse activity, impact, and linkage measures exist and can be applied to quantify the research contribution of institutions. However, very few citation studies have attempted to analyze the geographical concentration of highly cited authors, institutions, and countries.
Batty's work [12] is an exception: it shows that the distribution of citation counts is highly skewed, with most citations being associated with a few individuals working at a small number of institutions in an even smaller number of places and countries.

Here, we are interested in studying the diffusion of scholarly knowledge. We assume that scholarly knowledge diffuses via co-authorships, the physical movement of authors through geographical space, and the production (writing) and consumption (citing) of papers, among others. Unfortunately, the identification of unique author names is an unresolved problem. Similarly, proper assignment of an author to his or her institution is often impossible due to the quality of available publication data.

Our work goes beyond existing research in that we do not only examine the citation counts for each institution but attempt to (1) identify geographically and statistically significant instances of institutions that act as major information sources; (2) correlate their behavior as producers (information sources), measured by the number of citations their papers receive, as consumers (information sinks), measured by the number of citations they make to papers produced at other institutions, and as self-consumers, measured by the number of self-citations; (3) use direct citation linkage to identify their interrelation based on the amount of directly exchanged information; and (4) analyze and visualize the importance of proximity in geographic space for information exchange.

In what follows, we formalize each institution as a node that acts both as a source (or producer) of information and as an information sink (or consumer). Arrows among institutions denote the flow of information. If a paper was published at institution A and is cited by a paper that was published at institution B, then there is an arrow going from A to B. The more papers produced at A are cited by B, the higher the volume of information flow.
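As an illustration of this formalization, the following minimal sketch builds the weighted, directed flow graph from a list of citation pairs. The function names, the pair-based input format, and the toy institutions A, B, and C are assumptions made for this example and are not taken from the original analysis.

```python
from collections import Counter


def build_flow_graph(citations):
    """Count information flow between institutions.

    `citations` is an iterable of (producer, consumer) pairs, one pair per
    paper-to-paper citation: the producer is the institution of the cited
    paper, the consumer is the institution of the citing paper.
    """
    flows = Counter()
    for producer, consumer in citations:
        flows[(producer, consumer)] += 1  # arrow from producer to consumer
    return flows


def produced(flows, inst):
    """Citations received by `inst` from other institutions (source role)."""
    return sum(n for (src, dst), n in flows.items() if src == inst and dst != inst)


def consumed(flows, inst):
    """Citations made by `inst` to other institutions (sink role)."""
    return sum(n for (src, dst), n in flows.items() if dst == inst and src != inst)


def self_consumed(flows, inst):
    """Self-citations: arrows that lead from `inst` back to itself."""
    return flows[(inst, inst)]


# Toy data (hypothetical): B cites A twice, C cites A once,
# A cites B once, and A cites itself once.
citations = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("A", "A")]
flows = build_flow_graph(citations)
```

In this toy graph the arrow from A to B carries a volume of 2, and A acts mainly as a source, with three citations received from elsewhere against one made.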
In this graph, the normalized out-degree of a node characterizes the role of an institution as an information source, and the normalized in-degree describes its role as an information sink. Links that lead from an institution to itself correspond to self-citations. Note that this formalization can be applied equally to authors, institutions, countries, etc.

Data Set and Data Analysis

The complete set of papers published in the Proceedings of the National Academy of Sciences (PNAS) in the years 1982-2001 was analyzed to determine knowledge diffusion pathways among major institutions as manifested in citation linkages among the papers. The data set contains 47,073 papers published by 18,994 unique authors, who work at 2,822 institutions. Institutions comprise academic institutions, research labs, and corporate entities. To be credited with an article, a given institution had to be the site of the first author listed on the paper. The paper most highly cited by papers within the set received 612 citations.

Given our interest in exploring the importance of spatial proximity for the diffusion of information within the U.S., we decided to analyze information diffusion patterns among major institutions, the spatial position of which is uniquely and persistently identified by their zip code and corresponding longitude and latitude coordinates. By 'major institutions', we refer to institutions that have acquired a high total number of citations for their papers.

An initial data cleaning step was performed to remove suffixes such as INC and MED. These suffixes indicate whether the entity in question is a corporate entity, a research lab, or an academic institution. However, they are not used consistently with respect to the spacing between the name of the institution and the suffix, which leads to string matching problems. Removing these suffixes makes the institution names in the data set uniform.
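The suffix-removal step could be sketched as follows. This is only an illustration: the source names INC and MED as examples but does not give the full suffix list or the exact matching rules, so the `SUFFIXES` tuple, the regular expression, and the institution names below are hypothetical.

```python
import re

# Hypothetical suffix list; the source mentions only INC and MED as examples.
SUFFIXES = ("INC", "MED")

# Match one trailing suffix, preceded by any run of spaces, commas, or periods.
_SUFFIX_RE = re.compile(r"[\s,.]*\b(?:%s)\.?\s*$" % "|".join(SUFFIXES))


def strip_suffix(name):
    """Return an upper-cased institution name without its trailing suffix."""
    cleaned = " ".join(name.upper().split())  # collapse irregular spacing
    return _SUFFIX_RE.sub("", cleaned).strip()


# Invented names showing inconsistent spacing before the suffix.
examples = ["ACME LABS INC", "ACME LABS   INC", "ACME LABS,INC", "STATE UNIV MED"]
cleaned_examples = [strip_suffix(n) for n in examples]
```

The word boundary in the pattern keeps names that merely end in the same letters, such as a name ending in ZINC, from being truncated.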
Next, we had to decide which institutions should be merged. For example, an institution such as Indiana University has several campuses. Collapsing all these campuses into one entity causes valuable geographic information to be lost, since the campuses might be far apart. However, treating each campus separately can result in extremely cluttered data. Separating the campuses of the same university also splits its citations among them. For example, Indiana University as a single entity might qualify for the list of the 500 most highly cited institutions, but when the campuses are split, none of the individual campuses might have the requisite number of citations to make it into this list.

The zip code was used to preserve information about where two institutions with the same name, but with differing geographic locations, are located. The United States zip code system assigns postal codes based on the position of a location in a hierarchy of geographic areas. In the 5-digit zip code, the first digit indicates which region of the U.S. the location belongs to, such as northeast or southwest. The next two digits indicate state and county information. The final two digits distinguish finer boundaries such as towns and cities within a county. A unique ID was created for each institution by concatenating the (abbreviated) name of the institution with its zip code. As this system is unique to the United States, non-U.S. institutions such as the University of Tokyo (1,797 citations), despite producing highly cited publications, were excluded from the analysis presented in this paper.

We then proceeded to determine the level of geographic resolution that is significant for answering our question. Given that universities typically do not have two major campuses in one county, we decided to use the county as our smallest unit. Hence, for each institution, all its campuses or instances that lay within the same county were collapsed into one entity. In zip code terms, this meant merging all instances of an institution whose zip codes differed only in the last two digits. The newly created identity of the institution consisted of a concatenation of the (abbreviated) name with the smallest zip code within that county. For example, INDIANA UNIV47401 and INDIANA UNIV47405 were collapsed into INDIANA UNIV47401. Collapsing universities in this manner provides a good compromise between maintaining geographic identity and statistical significance.

Subsequently, the 500 most highly cited institutions were identified. These institutions produced 30,572 (64.95%) of all papers and received 195,889 (51.83%) of a total of 377,935 citations. A graph showing the number of listed references, received citations, and self-citations over the alphabetically sorted list of institutions is given in Figure 1. An offset was applied to citation counts to improve readability.
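The county-level collapsing and ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(name, zip_code, citations)`, the function names, and all citation counts are invented for the example; only the INDIANA UNIV identifiers and the rule of merging on the first three zip digits come from the text.

```python
from collections import defaultdict


def collapse_by_county(records):
    """Merge instances of an institution whose zip codes share the first
    three digits, and sum their citation counts.

    `records` is an iterable of (name, zip_code, citations) tuples, where
    zip_code is a 5-character string. Returns {collapsed_id: citations},
    with collapsed_id = name + smallest zip code within the group.
    """
    groups = defaultdict(list)
    for name, zip_code, citations in records:
        groups[(name, zip_code[:3])].append((zip_code, citations))

    collapsed = {}
    for (name, _prefix), members in groups.items():
        smallest_zip = min(z for z, _ in members)  # equal-length digit strings
        collapsed[name + smallest_zip] = sum(c for _, c in members)
    return collapsed


def top_cited(collapsed, k=500):
    """Return the k most highly cited collapsed institutions."""
    return sorted(collapsed.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Citation counts below are made up purely for illustration.
records = [
    ("INDIANA UNIV", "47401", 120),
    ("INDIANA UNIV", "47405", 80),
    ("INDIANA UNIV", "46202", 50),  # different zip prefix, kept separate
]
collapsed = collapse_by_county(records)
```

Here the two instances with prefix 474 merge into INDIANA UNIV47401, while the instance with prefix 462 remains a separate entity, which preserves the geographic distinction between distant campuses.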